This report summarizes the method used to create time series plots for all California counties and an animated map of California for Covid-19 cases. The data is obtained from the New York Times using the code provided. Because the report focuses on California, we filter the data to California counties only.
To select the California counties, we first use the regular expression pattern “(state.California|California.state)” to grep the list that contains the information for California. The geoid from that list is then used to grep the lists of all California counties using the pattern “county.%s|county.%s”.
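The two-step lookup can be sketched in base R as follows. This is only an illustration: the `keys` vector and its naming scheme are hypothetical, since the real names depend on how the New York Times data was parsed, and the geoid "06" assumes the geoid is California's state FIPS code.

```r
# Hypothetical keys; the real ones depend on how the data was parsed.
keys <- c("state.California", "state.Nevada",
          "county.06.Alameda", "county.06.Fresno", "county.32.Clark")

# Step 1: locate the California entry.
# Note that an unescaped "." in a regular expression matches any character.
state_idx <- grep("(state.California|California.state)", keys)

# Assumption: the geoid taken from that entry is the state FIPS code "06".
geoid <- "06"

# Step 2: fill the geoid into the county pattern and keep the matching keys.
county_pattern <- sprintf("county.%s|county.%s", geoid, geoid)
ca_counties <- grep(county_pattern, keys, value = TRUE)

print(ca_counties)
```

Building the pattern with sprintf keeps the geoid in one place, so the same code could be reused for another state by changing only the first pattern.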
The data contains the name of the county, the date, and the total cases and deaths on a given date. The cases listed in the original data are cumulative. However, the plot is designed to show new cases, new deaths, and values pro-rated by a county’s population on a given date, so further processing is needed. We use a for loop over the data to calculate these quantities. New cases and new deaths are calculated by subtracting the previous date’s totals from the current date’s totals. For the pro-rated values, we calculate cases per 1000 population and deaths per 1000 population. The results are then stored in a data frame.
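A minimal sketch of this processing step is shown below. The column names and the toy numbers are illustrative (only Alameda's population comes from the data shown in this report), the daily differences use `diff` inside a loop over counties rather than a row-by-row loop, and the per-thousand values are assumed to be based on the cumulative totals.

```r
# Toy cumulative data for two counties; values are illustrative only.
df <- data.frame(
  County     = rep(c("Alameda", "Fresno"), each = 3),
  Date       = rep(as.Date("2020-03-01") + 0:2, times = 2),
  Cases      = c(1, 1, 3, 0, 2, 5),
  Deaths     = c(0, 0, 1, 0, 0, 0),
  Population = rep(c(1671329, 1000000), each = 3)  # Fresno value is made up
)

# Sort so that each county's rows are in date order.
df <- df[order(df$County, df$Date), ]
df$NewCases  <- 0
df$NewDeaths <- 0

# For each county, new counts are the differences between consecutive totals.
# The first date keeps its total because there is no previous date.
for (county in unique(df$County)) {
  rows <- which(df$County == county)
  df$NewCases[rows]  <- c(df$Cases[rows][1],  diff(df$Cases[rows]))
  df$NewDeaths[rows] <- c(df$Deaths[rows][1], diff(df$Deaths[rows]))
}

# Assumption: pro-rated values are cumulative totals per 1000 residents.
df$CasesPerThousand  <- df$Cases  / df$Population * 1000
df$DeathsPerThousand <- df$Deaths / df$Population * 1000
```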
The resulting plot shows hover text that contains the name of the county, the date, new cases and deaths, pro-rated values, population, total cases, and total deaths. The hover text is formatted before plotting and stored in a new variable called “Text” in the data frame. An example of the formatted hover text is:
## [1] "Alameda <br>Date: 2020-03-01 <br>New Cases: 1 <br>New Deaths: 0 <br>Cases Per Thousand: 0.00 <br>Deaths Per Thousand: 0.00 <br>Population: 1671329 <br>Cases: 1 <br>Deaths: 0"
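One way to build such a string is with sprintf, as in the sketch below. The column names other than “Text” are assumptions, and the single row simply reproduces the values in the example above.

```r
# One row matching the example; column names (except Text) are assumed.
df <- data.frame(
  County = "Alameda", Date = as.Date("2020-03-01"),
  NewCases = 1, NewDeaths = 0,
  CasesPerThousand = 1 / 1671329 * 1000, DeathsPerThousand = 0,
  Population = 1671329, Cases = 1, Deaths = 0
)

# <br> is the line break that plotly understands inside hover text.
fmt <- paste0(
  "%s <br>Date: %s <br>New Cases: %.0f <br>New Deaths: %.0f ",
  "<br>Cases Per Thousand: %.2f <br>Deaths Per Thousand: %.2f ",
  "<br>Population: %.0f <br>Cases: %.0f <br>Deaths: %.0f"
)

# sprintf is vectorized, so this fills the column for every row at once.
df$Text <- sprintf(fmt, df$County, format(df$Date), df$NewCases, df$NewDeaths,
                   df$CasesPerThousand, df$DeathsPerThousand,
                   df$Population, df$Cases, df$Deaths)
print(df$Text)
```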
We use ggplot to create the time series plots for all counties, using county as the grouping variable, and then use ggplotly to convert the ggplot object to an interactive plot. We want the interactive plot to let the viewer hide all lines, select the counties of interest, and highlight individual counties.
A plot produced by ggplotly lets the viewer turn a line off by clicking its legend entry. We found that clicking the legend triggers an event that changes the “visible” argument of that line to “legendonly”. We used this mechanism to create buttons that hide and show all lines. The button that hides all lines changes the “visible” argument of every line to “legendonly” when clicked. Similarly, the button that shows all lines changes the argument to “T” (TRUE) for every line.
For the highlight feature, plotly provides a highlight function that changes the opacity of every line except the selected one. We use the highlight_key function to create the SharedData object that the highlight function requires. Highlighting is activated when the viewer clicks on a line, which avoids conflicts with the hover behavior, and deactivated by a double click.
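The button mechanism can be sketched with plotly's `updatemenus` layout option and the `restyle` method. The toy data, column names, and button labels below are illustrative and not the report's actual code.

```r
library(ggplot2)
library(plotly)

# Toy data; values are illustrative only.
df <- data.frame(
  County   = rep(c("Alameda", "Fresno"), each = 3),
  Date     = rep(as.Date("2020-03-01") + 0:2, times = 2),
  NewCases = c(1, 0, 2, 0, 2, 3)
)

p <- ggplot(df, aes(x = Date, y = NewCases, group = County, color = County)) +
  geom_line()

# "restyle" applies the given attribute value to every trace, so one click
# sets "visible" for all lines at once.
fig <- ggplotly(p) %>%
  layout(updatemenus = list(list(
    type = "buttons",
    buttons = list(
      list(label = "Hide all", method = "restyle",
           args = list("visible", "legendonly")),
      list(label = "Show all", method = "restyle",
           args = list("visible", TRUE))
    )
  )))
```

Using “legendonly” rather than FALSE keeps each county in the legend, so the viewer can click individual entries to bring selected lines back.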
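A minimal sketch of the highlight setup is given below, assuming a data frame with a County column. The toy data is illustrative; `highlight_key` and `highlight` are the plotly functions named above.

```r
library(ggplot2)
library(plotly)

# Toy data; values are illustrative only.
df <- data.frame(
  County   = rep(c("Alameda", "Fresno"), each = 3),
  Date     = rep(as.Date("2020-03-01") + 0:2, times = 2),
  NewCases = c(1, 0, 2, 0, 2, 3)
)

# Wrap the data in a SharedData object keyed by county, so that selecting
# one point selects the whole line for that county.
shared <- highlight_key(df, ~County)

p <- ggplot(shared, aes(x = Date, y = NewCases, group = County, color = County)) +
  geom_line()

# Click to highlight, double click to clear. Hover is left untouched.
fig <- ggplotly(p) %>%
  highlight(on = "plotly_click", off = "plotly_doubleclick")
```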
For the animated map, we obtained the GeoJSON file from plotly to create a polygon for every county and used a plotly function to create the map. We then extract the id of every county from the GeoJSON file and add it as a variable to the data frame, since plot_ly requires the geoids to locate the polygons.
The animation part is tricky. We tried two different approaches. The first creates a list of maps, one per date, together with a list of steps that controls the order in which the maps are displayed by modifying their “visible” arguments through plotly's slider API. However, this approach requires a custom button and extra code to start the animation. Given the limited time, we instead used the animation support provided by plotly. We set the animation frame to the date, so each date is one step in the animation. In addition, the output would be extremely large if the animation included every date in the data set. We therefore keep only every 10th day to reduce the size, and we always check that the last date is included.
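The date thinning can be sketched in base R as follows. The date range is illustrative; in practice `dates` would be the sorted unique dates in the data set.

```r
# Illustrative date range; in practice use sort(unique(df$Date)).
dates <- seq(as.Date("2020-03-01"), as.Date("2020-04-15"), by = "day")

# Keep every 10th date, starting from the first one.
keep <- dates[seq(1, length(dates), by = 10)]

# Always include the last date; unique() drops it if it was already kept.
keep <- unique(c(keep, max(dates)))

print(keep)
```

The data frame would then be subset with something like `df[df$Date %in% keep, ]` before it is passed to the map.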
The final result is shown above. The bscols function is used to combine the time series plot and the animated map. The final plot achieves most of the desired functions. The viewer can click on any line of the time series plot to highlight it and double click to remove the highlight. The viewer can use the buttons to hide and show all lines. Both the map and the time series plot have detailed hover information.
There are still potential improvements for this project. One problem I encountered is the initial map of the animation: the animation function from plotly does not support setting the initial frame. One easy workaround is to deep clone all data from the latest date and add it as the initial frame. However, this is not ideal, since the animation will start at that date and the slider will always contain this extra initial value. A better solution is to use the first approach described above for the animation and set the active value of the slider to the last date in the data set. It would also be ideal if the viewer could select a county of interest on the map and have all lines hidden except the one for the selected county. This could be achieved by obtaining the county name with JavaScript code and then changing the “visible” argument to “F” (FALSE) for every line except that of the selected county.
I had trouble making these work in the limited time. Nevertheless, this is a very interesting project, and I would like to continue working on it. I would also try to find a better solution for the animation with a better base map, and extend the project to more states.
Examples and documentation from the plotly website were referenced for this project. URL: https://plotly.com/r/